(54) A method and apparatus for displaying references to a user's document browsing history
within the context of a new document 



(57) An electronic document reader supplements a new
document with selectable links that reference portions of
previously read documents that have content similar to a
passage of the new document. The portions of the previously
read documents may be previously identified and stored in a
memory to expedite processing. The portions of previously
read documents are identified by annotations or explicitly
by a user. The identified portions are indexed, clustered
and are used as proxies for topics. This invention segments
a new document into passages and matches the passages to
the stored topics based on the similarity of the content.
The topics that exceed a content similarity threshold cause
corresponding selectable links to be displayed in the
display of the new document near the corresponding passage.
The user of this invention can then choose to follow the
selectable link to learn more about the topic of the
corresponding segment. In this manner, this invention aids
a reader in connecting material in the new document with
material in previously read documents.
Description

[0001] This invention relates to electronic document
reading systems. More particularly, this invention relates
to electronic document reading systems that supplement a
document with links to related portions of previously read
and annotated documents.
[0002] When a person reads documents for comprehension,
the reader establishes relationships between the document
currently being read and other, previously read, documents.
These relationships are then used by the reader to decide
whether the new document should be read in detail (for
example, because it complements previously read
information) or skipped because it only duplicates the
previously read documents.

[0003] Conventional systems and methods of conveying
document interrelationships to the reader typically rely on
graphical visualizations of documents where each document
is represented by a point. One such system is described in
"Bead: Explorations in Information Visualization",
M. Chalmers et al., Proceedings of SIGIR '92, pp. 330-337,
ACM Press (1992), incorporated herein by reference in its
entirety. Document relationships can also be represented as
a geometric shape. One such system is described in
"Visualizing Cyberspace: Information Visualization in the
Harmony Internet Browser", K. Andrews, Proceedings of the
IEEE Information Visualization Symposium '95, pp. 97-104,
IEEE Press (1995), incorporated herein by reference in its
entirety. While these techniques efficiently represent very
large numbers of documents, they do not effectively
represent the content of those documents. By suppressing
the document's content when displaying relationships, these
systems make it difficult for a reader to understand why
and how the documents are related.
[0004] Remembrance Agent displays a list of documents that
are related to the user's current context while the user
enters text. This system is described in "A Continuously
Running Automated Information Retrieval System", B. Rhodes
et al., Proceedings of the First International Conference
on the Practical Application of Intelligent Agents and
Multi Agent Technology (PAAM '96), pp. 487-495 (1996),
incorporated herein by reference in its entirety. However,
Remembrance Agent does not encourage reading or browsing,
because the suggestions are ephemeral and thus disappear
when additional text is entered. Furthermore, Remembrance
Agent does not consider the user's interests. No preference
is given to passages that the user found interesting or
relevant over other, potentially irrelevant, portions of
documents.

[0005] Bookmark Organizer presents a hierarchical list of
documents, but leaves it to the user to find the
appropriate documents. This system is described in
"Automatically Organizing Bookmarks Per Contents",
Y. Maarek et al., Computer Networks and ISDN Systems, 28,
pp. 1321-1333 (1996), incorporated herein by reference in
its entirety. In addition, although bookmarks identify the
potentially interesting documents, they do not identify the
passages that are of specific interest.
[0006] There is thus a need for an electronic docu- 
ment reader that supplements new documents with se- 
lectable links to relevant and annotated portions of pre- 
viously read documents. 

[0007] This invention provides a system and method that
automatically constructs relationships among segments of
different documents. Portions of a document that have been
identified as being interesting to a user are extracted
from previously read documents. These portions may have
been explicitly identified to the system by the reader, or
the relevance of the portion may be inferred by the system
based upon cues, such as annotations, made to the documents
by the user. The identified portions, or "surrogates", are
indexed and linked to the original documents and are used
as proxies for the user's various interests. The portions
are clustered based upon their relatedness to each other.
Therefore, each cluster of portions relates to a topic.
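
As an illustration only, the indexing and linking of surrogates can be sketched as a small inverted index. The names `Portion`, `PortionIndex` and `tokenize`, the document identifiers and the sample excerpts are hypothetical and do not come from this description, which does not prescribe any particular index structure.

```python
from collections import defaultdict
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Portion:
    """A surrogate: an excerpt the reader marked, plus a link back to its source."""
    doc_id: str   # identifier of the previously read document (hypothetical)
    offset: int   # character offset of the excerpt within that document
    text: str     # the excerpt itself


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PortionIndex:
    """Inverted index from a term to the portions that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, portion):
        # Index each distinct term of the excerpt once.
        for term in set(tokenize(portion.text)):
            self._postings[term].add(portion)

    def lookup(self, term):
        """Return the portions containing the term, sorted for stable output."""
        found = self._postings.get(term.lower(), ())
        return sorted(found, key=lambda p: (p.doc_id, p.offset))


# Made-up sample data, for illustration only.
index = PortionIndex()
index.add(Portion("doc-a", 120, "Term weighting improves retrieval"))
index.add(Portion("doc-b", 40, "Vector space retrieval models"))
print([p.doc_id for p in index.lookup("retrieval")])  # ['doc-a', 'doc-b']
```

Each `Portion` keeps the identifier and offset of its source, so a match on a surrogate can be turned back into a link to the original document.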
[0008] When a new document is opened it is seg- 
mented into passages and the passages are compared 
to the portions from the previously read documents. If a 
passage of the new document is identified as being 
closely related to a portion then a selectable link is pro- 
vided in the new document to the old document from 
which the identified portion originated. The user may 
then choose to select the selectable link to the old doc- 
ument to read the portion of the old document to en- 
hance understanding of the new document by reminding 
or refreshing the understanding of the reader. In this 
manner a user's understanding of a new document is 
enhanced. 
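
The segmentation of a new document into passages is not tied to any particular rule in this description. A minimal sketch, assuming plain text in which passages are separated by blank lines (the function name `segment_into_passages` is hypothetical), might look like this:

```python
import re


def segment_into_passages(text):
    """Split text into passages at blank lines and normalize inner whitespace."""
    blocks = re.split(r"\n\s*\n", text)
    # Drop empty blocks and collapse line breaks inside each passage.
    return [" ".join(block.split()) for block in blocks if block.strip()]


sample = "First passage,\nstill first.\n\nSecond passage.\n\n\n  Third.  "
for number, passage in enumerate(segment_into_passages(sample), start=1):
    print(number, passage)
```

Other granularities (sentences, sections, pages) would fit the description equally well; the blank-line rule is only the simplest choice for plain text.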

[0009] These and other features and advantages of 
this invention are described in or are apparent from the 
following detailed description of the preferred embodi- 
ments. 

[0010] The preferred embodiments of this invention 
will be described in detail, with reference to the following 
figures, wherein: 

Fig. 1 is a block diagram of one embodiment of the
electronic document reader of this invention;

Fig. 2 is a flowchart outlining how the portions are
formed and stored;

Fig. 3 is a flowchart outlining the control routine of
one embodiment of the method of this invention;

Fig. 4 shows a display of a new document with selectable
links to previously-read documents according to this
invention;

Fig. 5 shows a display of a previously-read document that
is referenced by a selectable link in the display of the
new document shown in Fig. 4;

Fig. 6 shows a display of another previously-read document
that is referenced by a second selectable link in the
display of the new document shown in Fig. 4; and

Fig. 7 is a block diagram of one embodiment of the
processing system of this invention.

[0011] Fig. 1 shows one embodiment of an electronic
document reading system 10 of this invention. The
electronic document reading system 10 includes a processor
12 communicating with a first memory 14 that stores
previously read and annotated documents 16 and a second
memory 18 that stores a new document 20 that is currently
being read and displayed for a user on a display 22. The
processor 12 also communicates with a third memory 21 that
stores "surrogates", i.e., portions, 23 of the
previously-read documents 16. The processor 12 controls the
display 22 to display the new document 20 to the user of
the electronic document reading system 10. The processor 12
also communicates with an I/O interface 24 that, in turn,
communicates with any number of conventional I/O devices
26, such as a keyboard 28, a mouse 30 and a pen 32. The I/O
devices 26 are operated by a user to control the operation
of the electronic document reading system 10.
[0012] As shown in Fig. 1, the system 10 is preferably
implemented using a programmed general purpose computer.
However, the system 10 can also be implemented using a
special purpose computer, a programmed microprocessor or
microcontroller and any necessary peripheral integrated
circuit elements, an ASIC or other integrated circuit, a
hardwired electronic or logic circuit such as a discrete
element circuit, a programmable logic device such as a PLD,
PLA, FPGA or PAL, or the like. In general, any device
capable of implementing a finite state machine that can, in
turn, implement the flowcharts shown in Figs. 2 and 3 can
be used to implement the system 10.

[0013] Additionally, as shown in Fig. 1, the memories 14,
18 and 21 are preferably implemented using static or
dynamic RAM. However, the memories 14, 18 and 21 can also
be implemented using a floppy disk and disk drive, a
writable optical disk and disk drive, a hard drive, flash
memory or the like. Additionally, it should be appreciated
that the memories 14, 18 and 21 can be either distinct
portions of a single memory or physically distinct
memories.

[0014] Further, it should be appreciated that the links
15, 17 and 19 connecting the memories 14, 18 and 21 to the
processor 12 can each be a wired or wireless link to a
network (not shown). The network can be a local area
network, a wide area network, an intranet, the Internet, or
any other distributed processing and storage network. In
this case, the electronic document 20, the previously read
and annotated documents 16 and the document surrogates 23
are pulled from physically remote memories 14, 18 and 21
through the links 15, 17 and 19 for processing in the
system 10 according to the method outlined below. In this
case, the electronic document 20, the previously read and
annotated documents 16 and the document surrogates 23 can
be stored locally in some other memory device of the system
10 (not shown).

[0015] The method of this invention relies on at least two
subprocesses. The first subprocess maintains a list of
document portions 23 and the second subprocess matches the
document portions 23 to passages from the new document 20.
The results of any matches are displayed to the reader as
selectable links in the display of the new document 20 in
proximity to the matching passages of the new document 20.

[0016] A third optional subprocess clusters the document
portions 23 based upon their relatedness to each other.
Each cluster then approximates an identification of a
topic. The clustering speeds up the processing because the
clustering lowers the number of portions to be compared to
the new document. The attributes of the clusters are
compared to the passages of the new document and, once a
cluster is identified, the portions within the identified
cluster are analyzed. In this manner, the number of
portions that are analyzed is greatly reduced, because only
the portions within an identified cluster are analyzed
rather than all portions.
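
The saving from clustering can be made concrete with a small count of similarity computations per passage. The cluster sizes below are invented for illustration, and the function names are hypothetical; the count assumes one comparison per cluster attribute set plus one per portion in each identified cluster.

```python
def comparisons_without_clusters(cluster_sizes):
    """Every stored portion is compared to the passage."""
    return sum(cluster_sizes)


def comparisons_with_clusters(cluster_sizes, matched_clusters):
    """One comparison per cluster, then one per portion in each matched cluster."""
    return len(cluster_sizes) + sum(cluster_sizes[i] for i in matched_clusters)


sizes = [40, 25, 35]                            # 100 stored portions in 3 clusters
print(comparisons_without_clusters(sizes))      # 100
print(comparisons_with_clusters(sizes, [1]))    # 3 + 25 = 28
```

The benefit shrinks when a passage matches many clusters, and disappears when it matches all of them.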
[0017] It should be appreciated that these subprocesses
will generally be running concurrently in the background.
In particular, as the new document 20 is read and annotated
by the user, the subprocess outlined in Fig. 2 generates
new portions 23 to be used when reading a subsequent
document. At the same time, when the new document 20 is
opened, the subprocess outlined in Fig. 3 checks the
portions 23 generated from previous documents 16 for
relevance to the passages of the new document 20.

[0018] Fig. 2 is a flowchart outlining how the
previously-read documents 16 are analyzed to identify,
store and cluster their portions. Preferably, the
previously read documents 16 have been annotated by the
user so that the surrogates 23 which the user found
interesting can be identified and extracted into the memory
21. Starting in step S100, the control routine continues to
step S110, where the system segments the documents into
portions. The control routine then continues to step S120,
where the portions having annotations are identified. Then,
in step S130, the control routine stores the identified
portions with the references to the underlying annotated
pages. Then, in step S140, the portions 23 are clustered
using similarity metrics to identify major themes or topics
of interest to the user. Similarity metrics are well known,
and are described in, for example, "Introduction to Modern
Information Retrieval", G. Salton et al., McGraw-Hill,
1983, incorporated herein by reference in its entirety. A
set of cluster attributes is also determined for each
cluster. Next, the control routine continues to step S150,
where the control routine stops.
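
Steps S120 to S140 can be sketched as follows. This is only one possible reading: it assumes the documents have already been segmented (step S110), uses term-frequency vectors with cosine similarity, clusters greedily in a single pass, and takes the summed term vector of a cluster as its "attributes". None of these choices, nor the names `term_vector`, `cosine` and `build_clusters` or the 0.3 threshold, are specified by this description.

```python
from collections import Counter
import math
import re


def term_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_clusters(documents, threshold=0.3):
    """documents maps doc_id -> [(portion_text, is_annotated), ...]."""
    clusters = []  # each cluster: {"attributes": Counter, "members": [(doc_id, text)]}
    for doc_id, portions in documents.items():
        for text, annotated in portions:
            if not annotated:              # S120: keep only annotated portions
                continue
            vec = term_vector(text)
            best = max(clusters,
                       key=lambda c: cosine(vec, c["attributes"]),
                       default=None)
            if best is not None and cosine(vec, best["attributes"]) >= threshold:
                best["members"].append((doc_id, text))   # S130: store with reference
                best["attributes"] += vec                # S140: update attributes
            else:
                clusters.append({"attributes": Counter(vec),
                                 "members": [(doc_id, text)]})
    return clusters


docs = {
    "doc-a": [("vector space retrieval model", True),
              ("unrelated filler text", False)],
    "doc-b": [("retrieval model using vector space", True),
              ("cooking pasta with tomato sauce", True)],
}
for cluster in build_clusters(docs):
    print([doc_id for doc_id, _ in cluster["members"]])
```

With the made-up sample data, the two retrieval excerpts end up in one cluster and the cooking excerpt in another, while the unannotated portion is never stored.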
[0019] Preferably, steps S100-S150 are performed
continuously in the background as a user reads documents to
create an extensive set of clusters of portions of
previously read documents.

[0020] Fig. 3 is a flowchart outlining the control routine
of one embodiment of the method of this invention.
Beginning in step S200, the control routine continues to
step S210, where a new document 20 is segmented into
passages. Then, in step S220, similarity measures or scores
are determined between a selected passage of the new
document 20 and each set of cluster attributes in the
memory 21. Next, in step S230, the control routine
identifies those clusters that have similarity measures
that exceed a predetermined or user-specified threshold or,
alternatively, the system may identify the cluster with the
highest similarity score. If the system identifies
cluster(s) having a similarity score exceeding the
threshold, then control continues to step S240, where the
system determines similarity scores for each portion in the
identified cluster(s). The control routine then continues
to step S250, where the system identifies those portions
that have a similarity score exceeding a predetermined or
user-specified threshold. Control then continues to step
S260, where the control routine displays, for each
identified portion, one link to the appropriate old
document and associates the generated links with the
corresponding passage of the new document. Control then
continues to step S270. Alternatively, a link to the
document containing the portion having the highest
similarity score may be generated in the new document.
Lastly, if no similarity measures exceed the threshold in
steps S230 or S250, then control jumps directly to step
S270.
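
The two-stage test of steps S220 to S260 can be sketched for a single passage as below. The similarity measure is passed in as a parameter because the description leaves it open; the `jaccard` word-overlap measure, the representation of cluster attributes as a string of terms, the threshold values and all sample data are assumptions made for this sketch.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (one possible measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def links_for_passage(passage, clusters, similarity,
                      cluster_threshold, portion_threshold):
    """Return one (doc_id, portion) link per portion that passes both stages."""
    links = []
    for cluster in clusters:
        # S220/S230: compare the passage with the cluster attributes first.
        if similarity(passage, cluster["attributes"]) <= cluster_threshold:
            continue
        # S240/S250: only portions of an identified cluster are scored.
        for doc_id, portion in cluster["portions"]:
            if similarity(passage, portion) > portion_threshold:
                links.append((doc_id, portion))   # S260: one link per portion
    return links


clusters = [
    {"attributes": "retrieval vector space model",
     "portions": [("doc-a", "the vector space model of retrieval"),
                  ("doc-b", "term weighting in automatic retrieval")]},
    {"attributes": "pasta sauce tomato",
     "portions": [("doc-c", "tomato sauce for pasta")]},
]
passage = "a vector space model for document retrieval"
print(links_for_passage(passage, clusters, jaccard, 0.2, 0.3))
```

In this sample only the first portion of the first cluster passes both thresholds, so a single link to `doc-a` would be attached to the passage. The alternative embodiment, which links only to the highest-scoring portion, would replace the inner threshold test with a maximum over the scored portions.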

[0021] In step S270, the control routine determines if any
unchecked passages in the new document 20 exist. If so,
control returns to step S220, where the next passage of the
new document 20 is selected. Otherwise, control continues
to step S280, where the control routine determines if one
of the selectable links of a currently displayed document,
such as the new document 20, has been selected. If one of
the selectable links has been selected, then control
continues to step S290. Otherwise, if no selectable link is
selected in step S280, the control routine jumps directly
to step S300. In step S290, the corresponding old document
16 is displayed on the display 22 in place of the currently
displayed document, such as the new document 20 or a
previous old document 16. Preferably, the display is
centered on the corresponding portion of the old document
16. The control routine then continues to step S300.
[0022] In step S300, the control system determines if the
user has closed the currently displayed document 20 or 16.
If not, control returns to step S280. Otherwise, control
continues to step S310.
[0023] In step S310, the control routine determines if any
document 16 or 20 remains open. If so, control returns to
step S280. Otherwise, control continues to step S320, where
the control routine stops.
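
The loop formed by steps S280 to S320 can be sketched as an event replay. The sketch assumes that open documents form a stack, so that closing the displayed document reveals the one from which it was reached; the description says only that an old document is displayed in place of the current one, so the stack, the event names and the function `run_viewer` are hypothetical.

```python
def run_viewer(new_doc, events):
    """Replay (kind, target) user events; return (display history, open documents)."""
    open_docs = [new_doc]      # stack; the top is the currently displayed document
    shown = [new_doc]
    for kind, target in events:
        if not open_docs:      # S310 -> S320: no document remains open, stop
            break
        if kind == "select":   # S280 -> S290: display the linked old document
            open_docs.append(target)
            shown.append(target)
        elif kind == "close":  # S300: the displayed document was closed
            open_docs.pop()
            if open_docs:      # S310: another document is still open
                shown.append(open_docs[-1])
    return shown, open_docs


events = [("select", "old-1"), ("select", "old-2"),
          ("close", None), ("close", None), ("close", None),
          ("select", "old-3")]
print(run_viewer("new", events))
```

In the sample run the last `select` event is ignored, because the third `close` leaves no document open and the routine has already stopped.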
[0024] Figures 4-6 show the various documents and links
displayed on the display 22 during the operation of one
embodiment of the system of this invention according to one
embodiment of the method of this invention. In Fig. 4, the
display 22 shows to the user a new document 20, along with
selectable links 34' and 34". The selectable links 34' and
34" do not interfere with or interrupt reading because the
links 34' and 34" appear in a margin of the new document
20.

[0025] If the user selects one of the selectable links 34'
and/or 34", then the display 22 displays the corresponding
document 16' or 16". For instance, if the user selects the
selectable link 34', which is labeled as "SAAL93", then, as
shown in Fig. 5, the display 22 shows the corresponding old
document 16' that includes the corresponding identified
portion 36'. Alternatively, if the user selects the
selectable link 34", which is labeled as "SAMC83", then, as
shown in Fig. 6, the display 22 shows the corresponding old
document 16" that includes the corresponding identified
portion 36".
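
A plain-text approximation of the margin layout of Fig. 4 is sketched below. The function `render_with_margin`, the column width and the passage text are invented for illustration; only the link labels "SAAL93" and "SAMC83" are taken from the description, and an actual display would render selectable links rather than text.

```python
def render_with_margin(passages, links, width=40):
    """Return display lines: passage text on the left, link labels in a right margin.

    links maps a passage index to the list of labels shown beside that passage.
    """
    lines = []
    for i, passage in enumerate(passages):
        label = ", ".join(links.get(i, []))
        # Pad the passage to a fixed width so the margin column lines up.
        lines.append(f"{passage:<{width}} | {label}".rstrip())
    return lines


passages = ["Term weighting matters.", "Unrelated passage."]
for line in render_with_margin(passages, {0: ["SAAL93", "SAMC83"]}, width=25):
    print(line)
```

Keeping the labels in a separate column leaves the body text untouched, which mirrors the point that margin links do not interrupt reading.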

[0026] In one embodiment of this invention, when a
selectable link of the currently displayed document 20 or
16 is selected, the corresponding old document 16 is
displayed as the new currently displayed document. The
corresponding old document 16 is displayed with its
selectable links 34 displayed in the margin. Thus, in this
case, the old document 16 has become the currently
displayed document and the displayed selectable links 34
link the displayed old document 16 to other previously read
and annotated documents 16. In this manner, a user of this
invention can follow a trail of links to jump from document
to document to understand a topic. Accordingly, in this
embodiment, if the old document 16 has existing selectable
links 34, its selectable links 34 can be displayed.
Furthermore, it can be updated with additional selectable
links to subsequently read documents.
[0027] Fig. 7 shows a block diagram of one preferred
embodiment of the processor 12 of this invention. As shown
in Fig. 7, the processor 12 is preferably implemented using
a general purpose computer 100. The general purpose
computer 100 preferably includes a controller 110, a
segmenting system 120, a selecting system 130, a clustering
system 140 and an identifying system 150. These elements of
the general purpose computer 100 are interconnected by a
bus 160.

[0028] The segmenting system 120 and the clustering system
140, controlled by the controller 110, are used to
implement the flowchart shown in Fig. 2. The segmenting
system 120 and the selecting system 130, controlled by the
controller 110, are used to implement the flowchart shown
in Fig. 3. It should be appreciated that the segmenting
system 120, the selecting system 130, the clustering system
140 and the identifying system 150 are preferably
implemented as software routines running on the controller
110 and stored in a memory of the general purpose computer
100. It should be appreciated that many other
implementations of these elements will be apparent to those
skilled in the art.
[0029] It should be understood that the term "annotation"
as used herein is intended to include text, digital ink,
audio, video or any other input associated with a document.
It should also be understood that the term "document" is
intended to include a text document, a video document, an
audio document and any other information-storing document
and any combination of information-storing documents. The
term "document" is also intended to include passages from
documents and is not to be limited to whole or entire
documents. Further, it should be understood that the term
"text" is intended to include text, graphic images, digital
ink, audio, video or any other content of a document,
including the document's structure. A document's structure
is intended to include any divisible portion of a document
such as a word, sentence, paragraph, section, chapter,
volume, page, etc.

[0030] The detailed description describes that the
passages of new documents are compared with portions or
clusters of portions of previously read documents to
determine the similarity between them. This similarity
analysis may be done with any number or type of similarity,
relatedness or relevance algorithms.
[0031] While this invention has been described with the
specific embodiments outlined above, many alternatives,
modifications and variations are apparent to those skilled
in the art. Accordingly, the preferred embodiments
described above are illustrative and not limiting. Various
changes may be made without departing from the spirit and
scope of the invention as defined in the following claims.



Claims 

1. A method for providing a selectable link in a display
of a first document to at least one portion of at least
one second document, the method comprising:

segmenting the first document into a plurality
of passages;

identifying at least one portion of the at least
one second document having content similar to
at least one of the plurality of passages; and

displaying in the first document, for each such
portion, a selectable link to the second document
containing that identified portion.

2. The method of claim 1, further comprising:

determining if a selectable link is selected; and
displaying at least one portion of the second
document corresponding to the selected link.

3. The method of claim 1 or claim 2, further comprising:

segmenting each at least one second document
into a plurality of portions; and
storing the plurality of portions into a memory.

4. The method of any of claims 1 to 3, wherein displaying
the selectable link comprises displaying each link in a
margin of the first document proximate to the determined
passage that is similar to an identified portion.

5. The method of any of claims 1 to 3, wherein displaying
the selectable link comprises displaying the passage as
the selectable link to the corresponding at least one
portion.

6. An apparatus that provides, in a display of a first
document, selectable links to at least one portion of
at least one second document, the apparatus comprising:

a processing system, comprising:

a segmenting system that segments the first
document into a plurality of passages, and
an identifying system that identifies at least one
of a plurality of portions of the at least one second
document that is similar in content to at least
one passage of the first document; and

a display that displays the first document and
at least one selectable link, each selectable link
linking a passage of the first document to a
corresponding one of the at least one second
document having at least one portion that is similar
in content to that segment.

7. The apparatus of claim 6, wherein the processing
system further comprises a selection device for selecting
at least one of the at least one selectable link, the
display displaying the corresponding at least one portion
of the at least one second document based on the selected
selectable link.

8. The apparatus of claim 6 or claim 7, wherein the
segmenting system segments each at least one second
document to generate the plurality of portions.

9. The apparatus of any of claims 6 to 8, wherein the
identifying system identifies each similar portion
based on a similarity of each passage to a cluster
of the at least one portion.

10. The apparatus of any of claims 6 to 9, wherein the 
display displays the at least one selectable link as 
the display of the passage that corresponds to the 
at least one portion. 